Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Arthur Gretton

E3S

Variational Learning for Insertion-based Generation

Jun 01, 2026

Yangtian Zhang, Zhe Wang, Arthur Gretton, Rex Ying, David van Dijk, Michalis K. Titsias, Jiaxin Shi

Abstract:Non-monotonic sequence generation methods, such as masked diffusion models, provide a flexible alternative to left-to-right autoregressive modeling by allowing tokens to be generated in non-fixed and prescribed orders. Despite their practical advantages, most existing non-monotonic models are order-agnostic and rely on a fixed-length grid, limiting their ability to support variable-length generation and adaptive insertion order. In this work, we introduce a probabilistic framework for learning insertion order in variable-length insertion models. We formalize a bijective correspondence between insertion trajectories and permutations, which enables an exact reparameterization of the data likelihood as a sum over permutations. Building on this result, we propose the Insertion Process (IP), a stochastic generative model that jointly learns where to insert, what to insert, and when to terminate, trained via permutation-based variational inference. Unlike prior fixed-canvas approaches, IP natively supports variable-length generation and learns data-driven preferences over insertion orders. Experiments on goal-conditioned planning and molecular string generation demonstrate that learning insertion order improves both modeling quality and generalization in domains without a canonical left-to-right structure.

Via

Access Paper or Ask Questions

Perturbative methods for non-parametric instrumental variable

May 29, 2026

Wei Bu, Arthur Gretton

Abstract:We introduce a perturbative approach for nonparametric instrumental variable (NPIV) estimation. By drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher perturbation order corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation integral operator, which becomes especially useful when the integral equation is ill-defined. One source for such ill-definedness can be the curse of dimensionality. Our method performs across various dimensionality regimes, particularly when the dimensionality parameter $β$ which is defined through the number of samples $n$ and dimension $d$ as $n^β= d$, becomes large. Experimental results show that our first-order perturbative corrections can reduce prediction error by up to 99\% in high-dimensional ill-defined cases ($β> 0.7$) compared to standard ridge regression approaches. The performance improvement is maintained across a wide range of dimensions, with the advantage becoming more pronounced as dimensionality increases.

* 8+24 pages, 4 figures, comments welcomed

Via

Access Paper or Ask Questions

Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity

May 26, 2026

Jakub Wornbard, Zikai Shen, Dimitri Meunier, Arthur Gretton

Abstract:We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.

Via

Access Paper or Ask Questions

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

May 20, 2026

Victoria Lin, Taedong Yun, Maja Matarić, John Canny, Arthur Gretton, Alexander D'Amour

Abstract:Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise due to user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.

Via

Access Paper or Ask Questions

Spectral Souping: A Unified Framework for Online Preference Alignment

May 19, 2026

Yinlam Chow, Guy Tennenholtz, Ted Yun, James Harrison, Arthur Gretton, Andre Barreto, Bo Dai

Abstract:Reinforcement Learning from Human Feedback (RLHF) effectively aligns Large Language Models (LLMs) with aggregate human preferences but often fails to address the diverse and conflicting needs of individual users. To overcome this issue, we introduce Spectral Souping, a unified framework for efficient, online preference alignment. Our contribution is the discovery of a universal spectral representation within LLMs, which is proven to be highly amenable to model merging. This theoretical insight enables a two-phase methodology: we first learn a basis of specialized policies offline, each focused on a distinct, fine-grained preference dimension. An online adaptation algorithm then efficiently ``soups'' these policies at inference time, either by merging their outputs or parameters, enabling rapid model adaptation without the need for costly online retraining w.r.t. tailored preference rewards. Experiments on online preference alignment benchmarks demonstrate that our method achieves significant performance improvements over existing state-of-the-art approaches, presenting a scalable and computationally efficient solution for dynamically adapting LLMs to individual user preferences.

Via

Access Paper or Ask Questions

Sobolev Regularized MMD Gradient Flow

May 12, 2026

Chenyang Tian, Bharath K. Sriperumbudur, Arthur Gretton, Zonghao Chen

Abstract:We propose Sobolev-regularized Maximum Mean Discrepancy (SrMMD) gradient flow, a regularized variant of maximum mean discrepancy (MMD) gradient flow based on a gradient penalty on the witness function. The proposed regularization mitigates the non-convexity of the MMD objective and yields provable \emph{global} convergence guarantees in MMD in both continuous and discrete time. A more surprising appeal is that our convergence analysis does not rely on isoperimetric assumptions on the target distribution. Instead, it is based on a regularity condition on the difference between kernel mean embeddings. A key highlight of the proposed flow is that it is applicable in both sampling (from an unnormalized target distribution) -- using Stein kernels -- and generative modeling settings, unlike previous works, where a gradient flow is suitable for only generative modeling or sampling but not both. The effectiveness of the proposed flow is empirically verified on a broad range of tasks in both generative modelling and sampling.

Via

Access Paper or Ask Questions

Doubly Robust Proxy Causal Learning with Neural Mean Embeddings

May 10, 2026

Bariscan Bozkurt, Alexandre Galashov, Dimitri Meunier, Zikai Shen, Arthur Gretton, Houssam Zenati

Abstract:Unobserved confounding prevents standard covariate adjustment from identifying causal response functions in observational studies. Proxy causal learning addresses this problem through bridge equations involving treatment- and outcome-inducing proxies, avoiding direct recovery of the latent confounder. Existing doubly robust proxy estimators combine outcome and treatment bridges, but typically rely on fixed kernels, sieves, or low-dimensional semiparametric models; existing neural proxy methods are more flexible, but are largely single-bridge estimators. We develop a neural doubly robust framework for proxy causal learning with continuous and structured treatments. Our method introduces a neural mean-embedding estimator for the treatment bridge, combines it with a neural outcome bridge, and estimates the doubly robust correction through a final regression stage. The framework covers population, heterogeneous, and conditional dose-response functions, yielding full response-curve estimators rather than binary-treatment effects. The algorithms use two stages for each bridge and history-aware updates of the final linear layers to stabilize stochastic multi-stage training. We prove consistency of the algorithms showing that the doubly robust error is controlled by the final averaging and regression errors together with the smaller of the outcome- and treatment-side weak-norm bridge errors. Across synthetic and image-valued benchmarks, the proposed estimators outperform existing baselines and single-bridge neural estimators, showing the benefit of combining learned outcome and treatment bridges in a doubly robust construction. Our implementation is available at https://github.com/BariscanBozkurt/DRPCL-Neural-Mean-Embedding.

Via

Access Paper or Ask Questions

On the Wasserstein Gradient Flow Interpretation of Drifting Models

May 06, 2026

Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, Arnaud Doucet

Abstract:Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main results: first, that one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, that the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, the same same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.

Via

Access Paper or Ask Questions

Instrumental Variable Analysis Without Structural Equations

Apr 27, 2026

Zikai Shen, Dimitri Meunier, Houssam Zenati, Arthur Gretton, Nathan Kallus, Aurélien Bibaut

Abstract:We consider debiased inference on least-squares solutions to inverse problems as a way to avoid having to assume exact solutions exist. Such assumptions are substantive and not innocuous and their failure may well imperil inference when we impose them on the statistical model. Our approach instead allows us to conduct inference on a quantity that is defined regardless of solutions existing and coincides with the usual estimands when they do. For the case of instrumental variables, this means we can motivate the analysis with structural models but these do not need to hold exactly for the inferential procedure to remain valid.

Via

Access Paper or Ask Questions

Efficient Inference after Directionally Stable Adaptive Experiments

Feb 25, 2026

Zikai Shen, Houssam Zenati, Nathan Kallus, Arthur Gretton, Koulik Khamaru, Aurélien Bibaut

Abstract:We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that would have been efficient under i.i.d. data remain asymptotically normal and semiparametrically efficient when computed from adaptively collected trajectories. The canonical gradient has a martingale form, and directional stability guarantees stabilization of its predictable quadratic variation, enabling high-dimensional asymptotic normality. We characterize efficiency using a convolution theorem for the adaptive-data setting, and give a condition under which the one-step estimator attains the efficiency bound. We verify directional stability for LinUCB, yielding the first semiparametric efficiency guarantee for a regular scalar target under LinUCB sampling.

* 34 pages

Via

Access Paper or Ask Questions